Segment selection considering local degradation of naturalness in concatenative speech synthesis
Abstract
In this paper, we investigate the effect of using a novel cost, RMS (Root Mean Square) cost, for segment selection for concatenative Text-to-Speech. The RMS cost is affected not only by the total degradation of naturalness but also by the local degradation of naturalness. From the results of experiments comparing this approach with segment selection based on a conventional average cost, it is found that (1) in the segment selection based on the RMS cost a larger number of concatenations causing slight local degradation are performed in order to avoid concatenations causing greater local degradation and (2) the effect of the RMS cost has little dependence on the size of the corpus. Moreover, we clarify that the naturalness of synthetic speech can be slightly improved by utilizing the RMS cost.
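The contrast between the two costs can be illustrated with a minimal sketch. It assumes that each segment in a candidate sequence has a non-negative local cost and that the sequence-level cost is either the mean or the root mean square of those values; the abstract does not give the exact definition of the local cost, so the function names and the numbers below are purely hypothetical.

```python
import math


def average_cost(local_costs):
    """Mean of the per-segment local costs (the conventional average cost)."""
    return sum(local_costs) / len(local_costs)


def rms_cost(local_costs):
    """Root mean square of the per-segment local costs."""
    return math.sqrt(sum(c * c for c in local_costs) / len(local_costs))


# Hypothetical local costs for two candidate segment sequences.
# Both have the same total degradation, but "spiky" concentrates it
# in a single concatenation while "even" spreads it out.
even = [0.3, 0.3, 0.3, 0.3]
spiky = [0.1, 0.1, 0.1, 0.9]

for name, costs in (("even", even), ("spiky", spiky)):
    print(name, round(average_cost(costs), 3), round(rms_cost(costs), 3))
```

Under these assumptions the average cost cannot distinguish the two sequences, whereas the RMS cost is higher for the one with a single large local degradation. This is consistent with finding (1) in the abstract: selection under the RMS cost accepts several slightly degraded concatenations in order to avoid one badly degraded concatenation.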
Similar resources
Perceptual Evaluation of Cost for Segment Selection in Concatenative Speech Synthesis
In segment selection for concatenative Text-to-Speech (TTS), it is important to utilize a cost that corresponds to perceptual characteristics. We clarify how the cost corresponds to perceptual scores, and then evaluate various functions for integrating the costs. The perceptual scores are determined from results of perceptual experiments on the naturalness of synthetic spee...
An evaluation of cost functions sensitively capturing local degradation of naturalness for segment selection in concatenative speech synthesis
In this paper, we evaluate various cost functions for selecting a segment sequence in terms of the correspondence between the cost and perceptual scores of the naturalness of synthetic speech. The results demonstrate that the conventional average cost, which shows the degradation of naturalness over the entire synthetic utterance, has better correspondence to the perceptual scores than the maxi...
Optimizing integrated cost function for segment selection in concatenative speech synthesis based on perceptual evaluations
This paper describes optimizing a cost function for segment selection in concatenative Text-to-Speech based on perceptual characteristics. We use the norm of a local cost for each segment as an integrated cost function for a segment sequence to consider both the degradation of naturalness over the entire synthetic speech and the local degradation. The cost function is optimized by adjusting not...
Stages and method of preparing syllable and diphone speech databases for a Persian text-to-speech system
Speech databases are part of concatenative text-to-speech synthesis systems. The phonetic quality of the databases plays a significant role in the naturalness of the synthesized speech. This paper introduces a syllable speech database and a diphone speech database for Persian and examines how they were developed, their specifications, and their advantages relative to each other. ...
Prosody-based unit selection for Japanese speech synthesis
A corpus-based concatenative speech synthesis system using no signal processing can produce intelligible synthetic speech maintaining original voice characteristics. In such a concatenative system, it is very important to select appropriate waveform segments that are naturally close to the target prosody. But with a limited size database it can sometimes be difficult to realize natural prosody. T...
Publication date: 2003